Zbuffer And Other Sorting Topics
Whizzter/Woorlic/Deluxe/Razor1911
First of all I want you to mail me with comments on this text as it is one of the first ones I've written and I want to know what I could improve to make it more educational. Mails go to whizzter@enjoy.joint.net.
Also mail me if you know more tricks.
Note: This tutorial is divided into a beginner part and a tips/advanced part. I also assume that you know interpolation, etc. If not, go reading some text on gouraud shading or alike.
Beginner Level - Zbuffer explanation
The advantage of zbuffers is that you never get any sorting errors and in case we are using other "heavy" rendering techniques like RGB gouraud shading, phong, tmapping, bilinear filtering, etc. we can expect a little speed increase by sorting front to back and having less computation/cachemiss per pixel.
For those who didn't know, zbuffers contain the nearest z value for each pixel, the main thing goes something like this.
1. Clear zbuffer before rendering. (This part can be removed, read further down.) 2. Render polygons and interpolate the z value. (Although it's better to interpolate 1/z instead as it's linear.) 3. When rending, if the z value is nearer than the z value already in the buffer then put the pixel and write the z value to the buffer. But if we are further away then skip to the next pixel.
A zbuffer innerloop could look something like this. (In this example we are interpolating 1/z so a bigger value is nearer, and the screen is 320x200 for simplicity.)
for (x=start_x;x<stop_x;x++)
{
if (zval>zbuffer[(y*320)+x)])
{
screen [(y*320)+x]=colour;
zbuffer[(y*320)+x]=zval;
}
zval+=zval_adder;
}
Tips/advanced - Why 1/z interpolation instead of z
Okay, this is my weakest part as I am only 18 years old and haven't had much maths at school. But I'll try to be as correct as possible anyway.
If you ever made a cube with fairly big faces that are texturemapped and without perspective correction, you will know that the middle of the texturemap will start to "bend", this is due to the fact that we have been interpolating the u and v values that belong in "3dspace" in "2dspace".
Your projection formula should be something like this in when going from 3d space to screenspace ("2d space").
x'=tan(fov/2)*x/z
y'=tan(fov/2)*y/z
And then if you don't perspective correct you will just use the u and v values, but if you want it perspective correct you will interpolate u/z, v/z and 1/z (this last component is needed to transform the u/z value back to a u value).
u'=u/z
v'=v/z
z'=1/z
Then for every time you correct (you can do it every pixel but that is not needed in most cases) you'll do something like this.
u=u'/z'
v=v'/z'
Anyhow, u', v' and z' can be correctly interpolated in screenspace, and as you should notice z' equals 1/z.
But aren't we interested in the z value not the 1/z so would we not need to convert the z' to a z value? Well no. This is because the backconversion is done with:
z=1/z'
And as 1 is constant z can always be assumed "relative" with the z', so we can test z' without any problems.
Never clearing the z buffer
As we now hopefully have decided to interpolate 1/z we also know that the 1/z value will always be;
0>=z'>=(1/"nearest z value")
I.e., if we clip at z=0.25 like I do then the max value z' can be is 4.
Wll how does this help?
If you use z' then a greater value will be nearer so you draw if the new value is bigger.
The reason we need to clear the zbuffer is that a pixel in the previous frame might have been nearer than the pixel in this frame.
But if we test the ( z' + framenumber*(1/"nearest z") ) value you will never have a problem as this added value will "push" the z' value forward past the old z' value.
Example with and without "adding":
frame z' value z' value
with add without add
0 0.75 0.75
1 4.749 0.749
error? no yes
Less interpolation of the z value
Gnilk of Noice once posted a message on country.sweden about z buffer matters, he had two ideas.
Let's skip the first idea because we came to the conclusion that it sucked anyway.
The second idea was to just test a median z value for each scanline instead of interpolating it all the time. He said that he used this in the biggest 3d scene in his winning Remedy '99 demo (final version now available at http://www.noice.org). This scene involves flying around a subway and could quite possibly eliminate some sorting errors you might get with normal sorting.
I tested this once but came to the conclusion that it wasn't really such a good idea, it still let sorting errors occur quite often.
Nevertheless the idea was good, so I thought why not do in on 16 pixel spans (i.e. median z value for the whole span). This is because I do a perspective correction every 16th pixel and therefore I already have the z' value interpolated, so I could directly use it for the testing.
This approach got almost perfect results, however at intersections you could notice some errors (although it didn't look that bad) so this approach will work very well on "normal" scenes where the modeller has taken care of intersections within objects.
Speedup gained by zbuffering
Gossip has it that zbuffering is slow. Well that's kinda old nowadays imho even though sbuffers are claimed to be the king of "overdraw" elimination (overdraw is when polygons are drawn several times on the same pixel this causing extra rendering time) and despite the fact that zbuffers take memory (cachemisses), extra checks and interpolation.
Still if your engine uses RGB gouraud shading etc. like mine you will not need much overdraw at all before you start to gain on your zbuffer, provided you do the right things.
1. Sort the polygons front to back, not back to front like normally. This way you won't render the pixel several times in case the polygon is behind anyway.
2. As you know cachemisses for textureloading are quite a big part of the cpu time you spend in the innerloop, skip that and you will gain some speed.
3. Even better, have two innerloops, one that assumes you are in front and then renders and ONLY jumps in case we go behind. And the other of course only jumps if we come in front. The biggest gain is that you don't need any jumps inside the inner and the second loop (the one for behind) should not load textures or perform any shading, just interpolating the needed values, thus gaining lotsa speed.
4. If you do perspective correction every 16th or so pixel you should have some tests before the innerloop to check if the whole 16 pixel span is obscured. The advantages are:
a. Even though we only run the second loop that just interpolates and checks, interpolation AND looping adds extra time.
b. You could just check some pixels and not all, e.g. I only check 8 pixels from the span (every other) and I get an practically as good result as if I had checked each pixel.
c. You can skip some calculation for the whole 16 pixel span, e.g. calculation of the interpolation values for that span.
d. You can easily test if the start or the end of the span is closer to you, and take the nearer value for this test. As usually when this 16th pixel test will help is just with big scenes and then there is big polygons in front and the polygons behind are far away so the bigger value will prolly be behind anyway.
5. I've been pondering on object removal with sbuffers by rendering the boundingbox and checking if anything would be visible. I totally forgot that you could do this with zbuffers also for a while as testing the whole boundingbox with a zbuffer is quite expensive compared to a sbuffer test, however KB reminded me as I read one posting from news.scene.org in the coders group, cheers. Even though this might speed it up will prolly not speed up THAT much unless you have an ideal scene for this.
Solution to the transparent sorting problem with z- and s-buffers
Note: This part assumes that you are using floats. Nowadays most people do this.
As you now know you must sort polys front to back to gain most speed with zbuffering (this also applies for sbuffers). One problem arises now though, you must draw transparent polys after the solid ones AND in back to front order.
This is because when you watch a transparent polygon you will see the solid behind, and drawing front to back both solids and transparent would fuck it all up, figuratively speaking.
So now you must divide the polygons into transperent and solid and then sort them, this will add a bit of sorting overhead (this is especially bad with highpoly scenes).
As you hopefully know already you can use radix sorting if you have all polygons with median z's that are positive (just clip before, not just because of this).
One neat thing is that floats are stored like this:
1 sign bit, 8 exponent bits and 23 bits for the mantissa.
As you noticed above we should have just positive values. This is the key to all.
Now, in case we are doing a radix sort for front to back then setting the highest bit (signbit) would make those polys to be considered behind because the highest bit is set (as radix handles numbers as integers kinda), so setting the sign bit with a simple negate of all transperent polygons would automagically put them to be drawn after the solids, although it isn't really true. Bad thing is that those polygons would also be sorted front to back, meaning that all transparent stuff would be very fucked up. But there is a solution to everything.
You know that 1/1=1, 1/2=0.5, 1/3=~0.33, notice anything? Yes indeed, the values are kinda turned around, bigger becomes smaller and smaller becomes bigger, so this would solve the sorting problem.
If I lost you somewhere on the way, the idea is anyway that before radix sorting when you gather the median z values do something like this (pseudo code):
loop starts:
take median z value
check if polygon is transperent
if true then
median_z=-1/median_z
end of case
store the median_z to the list
loop ends
So, now just radix sort and everything should be correct without several sortings.
Note: I think it was Zeal who said that he was pushing transperent polygons on a "stack". This might be a good solution in SOME cases, especially with sbuffering if you add object removal based on the sbuffer. But in most cases this method oughta be better.
"Correct" sorting instead of zbuffering
One of the main reasons people use zbuffering is because they get sorting bugs (hmm, don't go watching uturn party version now!!).
Imagine a wall that starts quite near you and ends some metres away. If two of the vertices in a triangle that create this wall are near you then anything on the side of this wall in the middle will probably be sorted behind even though it's in front of the wall. An solution would be to use several quicksorts but instead of using the z value you could by using the plane equation and line equation with a few points from each of polygons (maybe test extra if they are near each other?) and that way get correctly sorted polygons. Of course this could however fail sometimes but oughta be nearly as fast as normal sorting (and normal sorting is actually quite cheap considered to rendering time) and still correct.
One thing to be noted, this is just a idea but it should be fast enuff.
Contacting information and other shit
If for any reason you would like to contact me, then...
email: whizzter@enjoy.joint.net phn: +46-(0)922-31113 normal mail: Jonas Lund Kukkola 166 953 91 Haparanda SWEDEN
My personal homepage is at: http://woorlic.hostname.nu/jonaslund/
Oh, BTW. For everybody who saw the crap version of uturn that won the Remedy '99 64k compo, GET THE FINAL VERSION. It kicks the party version quite many times.
http://woorlic.hostname.nu/deluxe/utrnfinl.zip
Why? Well, the last 3d scene in the partyversion was silly, we replaced it. We also added some stuff, so it's quite nice now indeed! And please read the infofile!
Now one thing, I know TONS of people so I will probably forget too many of you, so I just say hello to everybody that I know, you know who you are.
Although for this time I want to thank Adok for getting me to write this article and making a great diskmag.
PS deserves a little extra hello also for really having the nerve to do a regular mag for so MANY unthankful people around!
That oughta be all, I hope you enjoyed this article and hopefully found at least some usable trick. Cya!!